As times change, so does the data collected during them. If you don't continually update your training datasets, your models will eventually decay until they become completely irrelevant. This tutorial will cover the following learning objectives:
What is Model Drifting?
Different Types of Model Drifting
Common Causes of Model Drifting
How to Detect Model Drifting
What is Model Drifting?
Summary
Model Drifting is when your model's performance degrades over time, in either a linear or nonlinear fashion.
Model Drifting can be attributed to numerous factors, including irrelevant data, poor hyperparameters, or feature irrelevance.
Different Types of Model Drifting
Summary
A Covariate Shift occurs when there is a change in the distribution of one or more features. This can happen when data collected from user input changes due to seasonality, consumer trends, or product placement.
A Prior Probability Shift occurs when the distribution of the features remains the same but the distribution of the label changes. This is very common with classification models, for example when users start preferring other product categories or consumers shift their spending to other industries during a recession.
A Concept Shift occurs when the correlation between one or more features and the label changes. The relationships the model learned during training no longer hold, so its predictions decay over time. This commonly occurs in time-series forecast models when seasonality is not correctly identified or noted. There are three types of Concept Shifts, illustrated with a short simulation after this list:
A Gradual Concept Drift is a concept shift that happens slowly over time and can be difficult to detect. An example of this is predicting web traffic for a startup that is slowly gaining more revenue, causing the relationship between the actual traffic feature and the estimated traffic label to change over time.
A Sudden Concept Drift is a concept shift that happens out of the blue. This can be caused by a dramatic event, such as bankruptcy, or a global economic event, such as a pandemic.
A Recurring Concept Drift is a concept shift that is caused by seasonality. For example, if you're trying to predict swimsuit sales and you build and train your model on only Summer sales data, what happens when Winter comes around? Although seasonality is a common occurrence in time-series forecasting models, if it's not labeled correctly, the model's inference can be negatively impacted.
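To make concept drift concrete, here is a small simulation with made-up numbers: the relationship between a feature and the label changes after the model is fit, so a model trained on the old relationship performs noticeably worse on newer data.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 1_000)

# Old regime: the label depends on the feature with slope 2.
y_old = 2.0 * x + rng.normal(0, 1, x.size)
# New regime: the relationship has drifted to slope 3 (a concept shift).
y_new = 3.0 * x + rng.normal(0, 1, x.size)

# Fit a simple linear model on the old regime only.
slope, intercept = np.polyfit(x, y_old, deg=1)
pred = slope * x + intercept

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

print(f"MSE on data from the old regime: {mse(y_old, pred):.2f}")  # low error
print(f"MSE on data from the new regime: {mse(y_new, pred):.2f}")  # much higher error
```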
Just like overfitting, underfitting can be caused by a lack of training data. If you don't have enough data present, your model will be too generalized to make valid predictions.
Common Causes of Model Drifting
Summary
Sampling mismatch occurs when the sample of training data fitted to your model does not represent the data the model will see in production. Depending on your sampling strategy, bias could be present, leading to poor predictive outcomes.
When you copy and paste your algorithm, keeping the same hyperparameters, onto a different dataset, the model won't accurately fit the new data, which leads to poor predictions.
Anomalies, or outliers, can appear in both the features and the label. These affect the distributions of the dataset as a whole and can skew the predictions toward the anomalous data point(s).
As mentioned before, seasonal effects that aren't labeled correctly can create bias in your training dataset.
There are a variety of data quality issues, such as incorrect input data, incorrect data processing steps, and duplications, that can cause the model to produce a poor output or use the input as a poor reference point for predictions.
How to Detect Model Drifting
Summary
The Kolmogorov-Smirnov (K-S) Test is a nonparametric statistical test that compares the cumulative distributions of the training and evaluation datasets. This test is used exclusively for numerical features; if you have categorical features, the chi-squared test can be used to detect data drift. This test has the following hypotheses (a code sketch follows them):
H0 - The distributions are the same in both datasets.
H1 - The distributions are not the same.
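Here is a minimal sketch of this check using SciPy's ks_2samp for a numerical feature and chi2_contingency for a categorical one. The arrays, category counts, and 0.05 significance level below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical feature values; in practice these come from your training
# and evaluation (or post-deployment) datasets.
rng = np.random.default_rng(42)
train_feature = rng.normal(loc=50, scale=10, size=5_000)
eval_feature = rng.normal(loc=55, scale=10, size=5_000)  # shifted distribution

# K-S test for a numerical feature: compares the two empirical CDFs.
statistic, p_value = stats.ks_2samp(train_feature, eval_feature)
if p_value < 0.05:  # assumed significance level
    print(f"Reject H0: drift detected (KS={statistic:.3f}, p={p_value:.4f})")
else:
    print(f"Fail to reject H0: no drift detected (p={p_value:.4f})")

# Chi-squared test for a categorical feature: compares category counts.
train_counts = np.array([1200, 800, 500])  # counts per category in the training data
eval_counts = np.array([900, 950, 650])    # counts per category in the evaluation data
chi2, chi_p, dof, expected = stats.chi2_contingency(np.vstack([train_counts, eval_counts]))
print(f"Chi-squared p-value for the categorical feature: {chi_p:.4f}")
```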
In the Population Stability Index (PSI), the distribution of the evaluation dataset's label is compared to the distribution in the training dataset that was used to create the model. The results of this test can be interpreted in the following ways (a sample calculation follows the thresholds):
PSI < 0.1: This indicates that the distributions of the two datasets have not changed or shifted.
0.1 < PSI < 0.2: This indicates that a modest modification or shift has taken place.
PSI > 0.2: This suggests that the distribution has changed significantly between the two datasets.
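Here is a minimal sketch of a PSI calculation with NumPy. The decile binning and the example score distributions are assumptions for illustration, not a prescribed setup.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compute PSI between a training (expected) and evaluation (actual) distribution."""
    # Bin edges come from the training data's quantiles (deciles by default).
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip evaluation values into the training range so nothing falls outside the bins.
    actual = np.clip(actual, edges[0], edges[-1])

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Guard against log(0) and division by zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.4, 0.1, 10_000)  # label/score distribution at training time
eval_scores = rng.normal(0.5, 0.1, 10_000)   # distribution observed after deployment

psi = population_stability_index(train_scores, eval_scores)
print(f"PSI = {psi:.3f}")  # < 0.1 stable, 0.1-0.2 modest shift, > 0.2 significant shift
```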
Detecting data drift between two populations (the training and evaluation datasets) can also be done using an ML model-based technique. When using this approach, you should consider the following:
The real-time data (new data points created after the model was deployed) should be labeled as 1, and the data used to train the current model in production should be labeled as 0.
You can then create a Logistic Regression model to predict that label. The evaluation metrics can be interpreted in the following ways (see the sketch after this list):
High Accuracy, Good Distinction - Drift Detected (the model can easily tell the two datasets apart, so their distributions differ)
Low Accuracy, Poor Distinction - No Drift Detected (accuracy near 50% means the two datasets are statistically indistinguishable)
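Here is a minimal sketch of this model-based check with scikit-learn. The feature matrices, the cross-validation setup, and the 0.55 accuracy cutoff are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical feature matrices: rows are observations, columns are features.
training_data = rng.normal(0.0, 1.0, size=(2_000, 5))  # data used to train the production model (label 0)
realtime_data = rng.normal(0.3, 1.0, size=(2_000, 5))  # data collected after deployment (label 1)

X = np.vstack([training_data, realtime_data])
y = np.concatenate([np.zeros(len(training_data)), np.ones(len(realtime_data))])

# If a simple classifier can tell the two datasets apart, their distributions differ.
clf = LogisticRegression(max_iter=1_000)
accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

if accuracy > 0.55:  # assumed cutoff; ~0.5 means the datasets are indistinguishable
    print(f"Drift detected: the classifier separates old and new data (accuracy={accuracy:.2f})")
else:
    print(f"No significant drift: accuracy={accuracy:.2f} is close to random guessing")
```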
The Adaptive Windowing (ADWIN) algorithm uses a sliding-window approach to detect concept drift. Rather than using a fixed window size, the algorithm grows its window while the newly arriving data looks stationary and shrinks it when a change is detected. This is commonly used in fraud detection algorithms and real-time machine learning applications where predictions need to be made rapidly. If a certain threshold is exceeded, an alert can be generated by the system.
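Below is a deliberately simplified, from-scratch sketch of the windowing idea: it keeps a window of recent values and raises an alert when the means of its older and newer halves diverge beyond an assumed threshold. The real ADWIN algorithm sizes its window adaptively and uses a statistical bound rather than this fixed cutoff; libraries such as river ship a full implementation.

```python
import random
from collections import deque

def detect_drift(stream, window_size=100, threshold=0.5):
    """Flag stream indices where recent values diverge from older ones.

    Simplified illustration of window-based change detection, not the full
    ADWIN algorithm (which adapts its window size and uses a statistical bound).
    """
    window = deque(maxlen=window_size)
    alerts = []
    for i, value in enumerate(stream):
        window.append(value)
        if len(window) == window_size:
            half = window_size // 2
            older, newer = list(window)[:half], list(window)[half:]
            if abs(sum(newer) / half - sum(older) / half) > threshold:
                alerts.append(i)   # e.g. raise an alert or trigger retraining here
                window.clear()     # reset after reporting a change
    return alerts

# Synthetic stream: the underlying mean jumps from 0.0 to 1.0 halfway through.
random.seed(7)
stream = [random.gauss(0.0, 0.2) for _ in range(500)] + [random.gauss(1.0, 0.2) for _ in range(500)]
print(detect_drift(stream))  # indices where a change was flagged
```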